
Staging

The staging layer is where all of our data is initially brought into the project. Staging models are the basic building blocks that we use to create the rest of the project.

Example Folder Structure

models
└── staging
    ├── stripe
    │   ├── stg_stripe__sources.yml
    │   ├── stg_stripe__models.yml
    │   ├── stg_stripe__transactions.sql
    │   └── stg_stripe__accounts.sql
    └── shopify
        ├── stg_shopify__sources.yml
        ├── stg_shopify__models.yml
        ├── stg_shopify__orders.sql
        └── stg_shopify__stock.sql

Structure Best Practices

  • ✅ Subdirectories ~ based on source system. This makes it easier to keep the complexity of the project low as it grows. They should not be based on business grouping at this stage.
  • ✅ File Name Structure ~ file names should follow a consistent pattern, and each file name must be unique across the project. stg_{source}__{entity} is a good format: it makes the data source and the entity easy to tell apart, and it avoids duplicate file names.


Models

Transformations

In staging, we generally apply a small, fixed set of transformations. Doing them here means they carry through to every downstream model that we create.

Transformation        Purpose
Renaming              Renaming tables or columns
Type casting          Converting data points to a specific type or format
Basic computations    For example, rounding or timezone changes
Categorization        For example, using a case statement to group values together
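
The four transformation types above could look like this in one staging model. This is a minimal sketch, not a definitive implementation: it assumes a source named stripe with a transactions table is declared in stg_stripe__sources.yml, and the raw column names (id, account, created, amount_cents, status) and status values are hypothetical.

```sql
-- models/staging/stripe/stg_stripe__transactions.sql
-- Sketch only: the raw column names and status values are assumptions.

with source as (

    -- Assumes a 'stripe' source with a 'transactions' table is declared
    -- in stg_stripe__sources.yml
    select * from {{ source('stripe', 'transactions') }}

),

renamed as (

    select
        -- Renaming
        id as transaction_id,
        account as account_id,

        -- Type casting
        cast(created as timestamp) as created_at,

        -- Basic computation: convert cents to whole currency units
        round(amount_cents / 100.0, 2) as amount,

        -- Categorization
        case
            when status in ('succeeded', 'paid') then 'completed'
            when status in ('failed', 'canceled') then 'not_completed'
            else 'pending'
        end as status_group

    from source

)

select * from renamed
```

Keeping each kind of transformation in a clearly commented part of the select list makes it easy to see that the model only cleans up the source and does not join or aggregate it.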

Materialisation

Staging models are materialized as views. They are not meant to be final data assets themselves, only building blocks. As views, they always give downstream models the freshest data possible, and they avoid wasting warehouse space on models that end users will not query.

# dbt_project.yml
models:
  your_project:
    staging:
      +materialized: view

Important

DRY code: Don't Repeat Yourself. Staging models help us keep our models DRY. Doing each transformation once in staging saves us from duplicating code, adding complexity, and spending compute on the same transformations multiple times.
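
As an illustration of the DRY point, a downstream model can select from the staging model through ref() instead of repeating the cleanup logic. This is a sketch under assumptions: the downstream model is hypothetical, and it assumes stg_stripe__transactions exposes columns named created_at, amount and status_group.

```sql
-- Hypothetical downstream model that reuses the staging model.
-- Assumes stg_stripe__transactions exposes created_at, amount and status_group.

select
    cast(created_at as date) as revenue_date,
    sum(amount) as total_amount
from {{ ref('stg_stripe__transactions') }}
where status_group = 'completed'
group by 1
```

Because the renaming, casting and categorization already happened in staging, this model contains only its own business logic, and any other downstream model can rely on the same cleaned columns.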